I am the Associate Director for the Master of Science in Data Science for Public Policy program (DSPP), an Assistant Teaching Professor in the McCourt School of Public Policy at Georgetown University, and a Computational Social Scientist on the Core Data Science team at Facebook.
My research interests lie at the intersection of political conflict, network science, and geographic information systems. My work examines the distribution and impact of political violence, extremism, and conspiracy theories on political and social systems.
As a computational social scientist, I develop, use, and teach computational tools that help researchers (a) draw descriptive and causal inferences from data and (b) leverage non-traditional data sources to better understand social processes.
Growth in event datasets is fostering research about patterns, dynamics, causes, and consequences of conflict. Studies typically rely on a single dataset. Instead, we advocate integrating multiple datasets to improve measurement and analysis. We have generated an integrated dataset covering all violent events for Africa from 1997 to 2018 from three leading datasets (ACLED, UCDP-GED, and GTD). Our approach involves both pre-processing the data so that they are comparable and using an automated approach to produce an integrated dataset that is transparent and reproducible. Through examining these integrated data, we find substantial overlap across these three datasets. At the same time, each dataset includes events that conceptually should be captured in the other datasets, but are not. Thus, we view these integrated data as offering a better measure of violent conflict. A statistical analysis shows that geographic features frequently used in analyses of the location of conflict events (including the distance from the capital or a border, terrain, economic development, and population) have different effects on the incidence and frequency of conflict events when using integrated data as compared to individual datasets. These illustrations highlight the potential for integration to advance conflict research by yielding a more complete and accurate picture of activity, which has repercussions for both descriptive and theoretical findings. Integration is likely to be increasingly worthwhile as event datasets proliferate, expand in coverage, and exhibit wider applications.
Why do some violent non-state actors (NSA) regularly innovate while others do so rarely? Recent studies suggest that variation in affiliation, bureaucratization, support, and competition yields different innovative capabilities. While emphasizing constraints, these studies tend to overlook the importance of internal drivers that make innovation more or less likely. I develop a theory of membership diversity in an NSA as an internal driver of innovation. Using panel data on 187 NSAs (1970 to 2018), I exploit variation in novel exposures to socially relevant ethnic populations as a diversity treatment in an intent-to-treat design to estimate the relationship between membership composition and tactical innovation. I demonstrate the validity of the empirical strategy using a directed acyclic graph. The analysis finds that the diversity treatment increases both the likelihood and rate of tactical innovation. By treating membership composition as an information problem, the findings underscore the importance of knowing who is in an NSA when evaluating organizational capabilities.
How does gender influence violent behavior? Existing research generally focuses on biological and contextual factors that drive variation in violence, often overlooking how internalized gender norms can influence violent actions. Isolating the effect of norms from biology is challenging because sex and gender are typically conflated. Moreover, it is difficult to observe whether individuals behave the same way in public as they do in private. To get around these issues, we examine a novel multi-player computer game setting where players can operate characters of varying genders, thus holding biology constant. The data track more than 488,000 unique players from over 150 countries for their first 30 days of gameplay. By exploiting variation in game mechanics, we find evidence that behavioral differences are attributable to internalized norms as opposed to biology or external sanctioning. We then leverage a natural experiment in the data to examine if these internalized norms can be altered. We find that both male and female players utilize their female characters more violently when exposed to examples of females in military roles. The project is the first to distinguish the effects of gender norms from the effects of biology or social sanctioning, confirming the importance of gender norms while clarifying the differences these norms exert on men’s and women’s behavior.
We develop a method to identify when states change their foreign policies based on an observable indicator: patterns of elite diplomatic meetings. We argue that elites choose diplomatic partners to advance a specific foreign policy agenda. When that agenda changes, so do the incentives to choose diplomatic partners. To locate these breaks we apply non-parametric structural break tests to time series models that predict a state’s diplomatic behavior. We argue that where these tests identify breaks in diplomatic behavior, a state has changed its foreign policy. We validate our theory using expert foreign policy analysis and quantitative cases. We first collect new daily diplomatic events data for Russia, Iran, the USA and Australia. We then compare structural breaks in these time series to expert assessments about foreign policy change. Consistent with expert reports, we locate structural breaks in Iran’s diplomatic behavior when Rouhani first comes to power, then in the months after sanctions are lifted; and in Russia’s diplomatic behavior 6 months before the Ukraine crisis, and then again when Russia extends its military into Syria. No break occurs in Australia’s or America’s diplomatic behavior as expected. We contribute to the empirical literature on conflict by providing new diplomatic data and a method to measure foreign policy change, and to theories of diplomacy by linking aggregated patterns of diplomatic behavior to foreign policy choices, not underlying intentions.
The growing multitude of sophisticated event-level data collection enables novel analyses of conflict. Even when multiple event data sets are available, researchers tend to rely on only one. We instead advocate integrating information from multiple event data sets. The advantages include facilitating analysis of relationships between different types of conflict, providing more comprehensive empirical measurement, and evaluating the relative coverage and quality of data sets. Existing integration efforts have been performed manually, with significant limitations. Therefore, we introduce Matching Event Data by Location, Time and Type (MELTT), an automated, transparent, reproducible methodology for integrating event data sets. For the cases of Nigeria 2011, South Sudan 2015, and Libya 2014, we show that using MELTT to integrate data from four leading conflict event data sets (Uppsala Conflict Data Program Georeferenced Event Dataset, Armed Conflict Location and Event Data, Social Conflict Analysis Database, and Global Terrorism Database) provides a more complete picture of conflict. We also apply multiple systems estimation to show that each of these data sets has substantial missingness in coverage.
Do firm founders from nations with more predictable and transparent institutions allocate more autonomy to their employees? A cultural imprinting view suggests that institutions inculcate beliefs that operate beyond the environment in which those beliefs originate. We leverage data from a multiplayer online role-playing game, EVE Online, a setting where individuals can establish and run their own corporations. EVE players come from around the world, but all face the same institutional environment within the game. This setting allows us to disentangle, for the first time, cultural norms from the myriad other local factors that influence organizational design choices across nations. Our main finding is that founders residing in nations with more predictable and transparent real-world institutions delegate more authority within the virtual firms they create.
Does greater ethnic inclusion in the executive have a positive effect on a country’s economic development? We posit that by allowing for greater diversity in a state’s decision-making process, ethnic populations find their preferences represented and thus are more likely to support enacted policies; at the same time, the quality of the policy increases as a greater variety of perspectives is introduced. Utilizing the new AMAR (All-Minorities at Risk) data to capture ethnic diversity, this article offers a preliminary description, suggesting that higher levels of inclusion positively correlate with indicators of economic growth.
This chapter offers insight into the utility of the latest release of Uppsala Conflict Data Program’s Georeferenced Event Dataset (UCDP-GED). The UCDP has an established record of compiling and disseminating an array of widely used data resources. The field of conflict studies, and the data that contributing scholars collect, have progressively moved toward greater specificity along several dimensions. UCDP-GED records the category of violence, the actors involved, the location and associated coordinates, and the timing of each event, as well as other characteristics. UCDP has been the source of the most widely used data in academic research on violence committed by organized armed actors. In particular, UCDP-GED provides a means for analyses to test micro-level theories. UCDP-GED has paved the way for methodological advances with a major bearing on substantive contributions to the literature.
2019 “Where a Founder Is from Affects How They Structure Their Company” (with David Waguespack and Johanna K. Birnir). Harvard Business Review.
I primarily teach graduate-level computational social science courses at Georgetown University. As an instructor, I try to balance substance with methodological rigor by training students how to effectively employ computational methods to investigate, analyze, and learn from data to formulate and test theoretically-relevant hypotheses. In my instruction, I match formal computational training with hands-on empirical examples so that quantitative methods are taught in the context where they are applied.
I aim to train students to: (i) use machine learning methods to explore and generate hypotheses from data; (ii) design and implement statistical designs geared toward inferring causal relationships from observational and experimental data; (iii) synthesize disparate and unstructured data to draw meaningful insights on public policy and political science questions; and (iv) visualize data to communicate empirical findings effectively. My goal is to train students to be effective consumers, critics, and producers of computational social science.
Course taught: Spring 2019, Spring 2020
This is the second course in the two-course sequence on quantitative methods for social science for the Master of Science in Data Science for Public Policy (DSPP). The course builds on students’ understanding of multivariate regression and introduces advanced, but commonly used, methods of statistical analysis. The course is broadly divided into two parts: advanced modeling and causal inference. Instruction will concentrate on how to determine the appropriate econometric approach in addressing various types of policy questions, while highlighting the challenges in isolating causal effects. The emphasis is on applied learning; formal proofs and mathematical rigor are presented but are not the principal focus of the course. As part of our effort to teach effective communication skills, students will make presentations about applications using the techniques being studied in class.
Course taught: Fall 2018, Fall 2019
This first course in the core data science sequence for the Master of Science in Data Science for Public Policy (DSPP) introduces students to the programming and mathematical concepts that underpin statistical learning. The aim of the course is to provide DSPP students with the foundations necessary to grasp the concepts and algorithms encountered in Data Science II and III. Students will cover topics related to linear algebra (with a focus on linear regression and dimension reduction); multivariate calculus (with an emphasis on optimization algorithms, specifically gradient descent); and probability theory (with an emphasis on simulation and sampling). Throughout the course, students will be introduced to the fundamentals of programming and manipulating data in Python. Students will work in Jupyter notebooks and use Git/GitHub to submit coding assignments, developing literate programming and reproducible research skills they will use throughout the program.
Course taught: Spring/Fall 2019, Spring 2020
This course teaches Masters of Public Policy (MPP) students how to synthesize disparate, possibly unstructured data in order to draw meaningful insights. Topics covered include fundamentals of functional programming in R, literate programming, data wrangling, data visualization, data extraction (via web scraping and APIs), text analysis, and machine learning methods. In addition, students will be exposed to Git and GitHub for reproducible research. The course aims to offer students a practical toolkit for data exploration. The objective of the course is to equip MPP students with the skills to incorporate data into their decision-making and analysis.
I advise thesis projects for students in the Masters of Conflict Resolution program at Georgetown University.
meltt (merging event data by location, time, and type) is an R package that offers a methodology for systematically integrating disparate geospatial event data by leveraging information on spatio-temporal co-occurrence and event-specific metadata.
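To give a feel for what matching on location, time, and type means, here is a minimal Python sketch. It is not the meltt algorithm or its R interface: the `Event` record, the `match_events` and `integrate` helpers, the greedy one-to-one pairing, and the default windows (1 day, 5 km) are all assumptions made for illustration, and event types are assumed to be already harmonized into a shared label.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; real datasets carry many more fields."""
    event_id: str
    day: date
    lat: float
    lon: float
    kind: str  # harmonized event type, e.g. "battle" or "protest"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def match_events(left, right, max_days=1, max_km=5.0):
    """Pair each left event with the nearest unmatched right event that has
    the same type and falls inside both the time and the distance window."""
    matches = []
    used = set()
    for a in left:
        best = None  # (candidate event, distance in km)
        for b in right:
            if b.event_id in used or a.kind != b.kind:
                continue
            if abs((a.day - b.day).days) > max_days:
                continue
            dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
            if dist <= max_km and (best is None or dist < best[1]):
                best = (b, dist)
        if best is not None:
            used.add(best[0].event_id)
            matches.append((a.event_id, best[0].event_id))
    return matches


def integrate(left, right, **windows):
    """Union of both sources, dropping right-hand events matched as duplicates."""
    duplicate_ids = {b_id for _, b_id in match_events(left, right, **windows)}
    return list(left) + [b for b in right if b.event_id not in duplicate_ids]


# Made-up records from two hypothetical sources.
source_a = [
    Event("a1", date(2011, 1, 1), 9.00, 7.00, "battle"),
    Event("a2", date(2011, 1, 10), 10.00, 8.00, "protest"),
]
source_b = [
    Event("b1", date(2011, 1, 2), 9.01, 7.01, "battle"),  # near a1 in space and time
    Event("b2", date(2011, 3, 1), 9.00, 7.00, "battle"),  # same place, months later
]
pairs = match_events(source_a, source_b)
merged = integrate(source_a, source_b)
```

Widening or narrowing `max_days` and `max_km` changes how aggressively reports are treated as the same underlying event, which is the main tuning decision in this kind of integration.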
tidysynth is an R package that offers a tidy implementation of the synthetic control method. The package makes a number of needed improvements when implementing the method in R, allowing users a greater capacity to inspect, visualize, and tune a synthetic control model.
R”. (Data Science in Action Seminar) McCourt School of Public Policy, Georgetown University
R” (Short Course) Smith School of Business, University of Maryland, College Park
R: A short course on processing, analyzing, and visualizing data in R” (Short Course) Creative Associates International, Washington DC
R” (Talk) University of Maryland, College Park
R” (Workshop) University of Iceland, Reykjavik
R” (Workshop) University of Maryland, College Park
R” (Workshop) University of Maryland, College Park